Aim: To plot and identify commercial markets using point of interest (POI) data

This task requires creating clusters of distinct commercial centers or markets from the points of interest data of a city (the city could be yours). Points of interest (POI) data provides the location of different places along with tags that describe them, such as school, type of outlet, or type of building.

Importing necessary libraries

In [167]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.pyplot import figure
%matplotlib inline

Importing GeoPandas

A library built on top of pandas (and libraries such as Shapely) for working with geospatial data

In [168]:
import geopandas as gpd

Reading the GeoJSON file exported from OpenStreetMap with the help of Overpass Turbo

The area is North Delhi.

In [169]:
df = gpd.read_file("rohini.geojson")

Overpass Turbo was used to run a query that extracts various types of POI for this area; the POI types are listed in the query script.
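The query script itself is not included here. As a hypothetical illustration of the kind of Overpass QL query that could be pasted into Overpass Turbo (whose results can then be exported as GeoJSON), the sketch below builds a query for nodes carrying certain POI tags inside a bounding box. The tag keys (`amenity`, `shop`), the bounding box and the helper name `build_poi_query` are assumptions, not the ones actually used.

```python
def build_poi_query(south, west, north, east, keys=("amenity", "shop")):
    """Build an Overpass QL query for POI nodes tagged with any of `keys`.

    Hypothetical helper: the keys and the bounding box are illustrative only.
    """
    bbox = f"{south},{west},{north},{east}"  # Overpass bbox order: S,W,N,E
    selectors = "\n".join(f'  node["{key}"]({bbox});' for key in keys)
    # Union of all selectors, returned as JSON with full tag data
    return f"[out:json][timeout:60];\n(\n{selectors}\n);\nout body;"


# Hypothetical bounding box roughly around North Delhi
print(build_poi_query(28.60, 77.05, 28.75, 77.32))
```

Only nodes are selected here, which matches the `node/...` ids in the data below; ways and relations would need `way[...]` and `relation[...]` selectors.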

In [170]:
df.head()
Out[170]:
id @id amenity atm brand:wikidata brand:wikipedia name name:en operator website ... unisex addr:district addr:subdistrict branch:type drink:sugarcane_juice air_conditioning drive_through studio healthcare geometry
0 node/280741143 node/280741143 bank yes Q2003549 en:Axis Bank Axis Bank Axis Bank Canara Bank https://www.axisbank.com ... None None None None None None None None None POINT (77.19423500000001 28.64725)
1 node/355436037 node/355436037 atm None None None ICICI Bank None None None ... None None None None None None None None None POINT (77.1723786 28.6458869)
2 node/355436042 node/355436042 fast_food None None None Dominos Pizza None None None ... None None None None None None None None None POINT (77.1722681 28.6457998)
3 node/459771176 node/459771176 cinema None None None Fun Cinemas, CRM, Shahdara None Fun Multiplex Pvt Ltd None ... None None None None None None None None None POINT (77.30198 28.656726)
4 node/496457107 node/496457107 bus_station None None None None None None None ... None None None None None None None None None POINT (77.2516076 28.6108671)

5 rows × 126 columns

In [171]:
df.shape
Out[171]:
(1123, 126)

Extracting the longitude and latitude coordinates from the 'geometry' column

In [173]:
df['Long'] = df['geometry'].x
df['Lat'] = df['geometry'].y

Converting the coordinates into an array of (lat, lon) pairs

In [174]:
coordinates = df[['Lat', 'Long']].to_numpy()  # shape (n, 2): one (lat, lon) pair per POI
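A quick shape check helps confirm that the coordinates really form one (lat, lon) pair per POI. The toy DataFrame below is an assumption standing in for `df`: wrapping the two columns in nested lists produces a 3-D array of shape (1, 2, n), whereas selecting both columns at once gives the (n, 2) layout that scikit-learn expects (n samples, 2 features).

```python
import numpy as np
import pandas as pd

# Toy stand-in for df with three POIs (illustrative values)
toy = pd.DataFrame({"Lat": [28.64, 28.65, 28.66], "Long": [77.19, 77.17, 77.30]})

nested = np.array([[toy["Lat"], toy["Long"]]])  # two Series wrapped in lists
pairs = toy[["Lat", "Long"]].to_numpy()         # one row per POI

print(nested.shape)  # (1, 2, 3)
print(pairs.shape)   # (3, 2)
print(pairs[0])      # first (lat, lon) pair
```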

Visualization:

Folium, a Python library built on Leaflet (an interactive JavaScript mapping library), is used here to plot and visualise the data points on the map.

Other map packages such as Basemap and Geopy were tried, but they did not give satisfactory results.

In [189]:
import folium
from folium import plugins
from folium.plugins import MarkerCluster
In [190]:
#Initiating a folium map instance of North Delhi Area
m = folium.Map([ 28.67304, 77.19767], zoom_start=12)
m
Out[190]:
In [191]:
#Setting the Map to show the data points in circular markers

for index, row in df.iterrows():
    folium.CircleMarker([row['Lat'], row['Long']],
                        radius=5,
                        popup=row['name'],
                        fill_color="#3db7e4",  # light blue fill
                       ).add_to(m)

#plotting the data points on map
m
Out[191]:

Analysing the density of POIs across the map

In [192]:
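# adding a heatmap to the folium map to show the density of the data points
# (stationArr was never defined; the (lat, lon) pairs come straight from df)

m.add_child(plugins.HeatMap(df[['Lat', 'Long']].to_numpy().tolist(), radius=13))
m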
Out[192]:
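The heatmap gives a visual impression of POI density. As a rough numeric counterpart (a swapped-in technique, not part of the original analysis), the sketch below bins coordinates into a regular grid with `numpy.histogram2d` and reports the densest cell. The helper name `densest_cell`, the number of bins and the sample coordinates are assumptions.

```python
import numpy as np

def densest_cell(lats, lons, bins=10):
    """Return (count, lat_range, lon_range) for the grid cell holding the most points."""
    counts, lat_edges, lon_edges = np.histogram2d(lats, lons, bins=bins)
    # Position of the largest count in the 2-D grid
    i, j = np.unravel_index(np.argmax(counts), counts.shape)
    return (int(counts[i, j]),
            (lat_edges[i], lat_edges[i + 1]),
            (lon_edges[j], lon_edges[j + 1]))

# Illustrative points: four close together, one far away
lats = [28.640, 28.641, 28.642, 28.641, 28.700]
lons = [77.190, 77.191, 77.190, 77.192, 77.300]
count, lat_range, lon_range = densest_cell(lats, lons, bins=5)
print(count, lat_range, lon_range)
```

On the notebook data this would be called as `densest_cell(df['Lat'], df['Long'])`; the cell size depends on the bounding box and `bins`, so it is only a coarse check against the heatmap.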

Grouping nearby POIs into clusters with Folium's MarkerCluster plugin

Zoom in and click around the map to see which shops make up the high-density markets.

In [193]:
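#Zipping the coordinates into a list
locations = list(zip(df.Lat, df.Long))

#Using the POI names as popups so the shops in each cluster can be inspected
popups = df['name'].fillna('Unnamed POI').astype(str).tolist()

#Creating the icon for the data points
icons = [folium.Icon(icon="shop", prefix="fa") for _ in range(len(locations))]

cluster = MarkerCluster(locations=locations, popups=popups, icons=icons)
m.add_child(cluster)
m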
Out[193]:

Clustering with Machine Learning

Density-based clustering algorithms rely on reachability, i.e. how many neighbours a point has within a given radius. DBSCAN suits this task better than KMeans because it does not need the number of clusters, k, to be fixed in advance; instead it takes a neighbourhood radius (eps) and a minimum number of points per dense region (min_samples). When the number of clusters hidden in the dataset is unknown and the data cannot easily be inspected visually, DBSCAN is a good choice. It produces a varying number of clusters depending on the input data, and labels points that fall in no dense region as noise (-1).
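To make the reachability idea concrete, here is a minimal from-scratch DBSCAN sketch, for illustration only (the notebook itself uses scikit-learn's `DBSCAN`). A point is a core point if at least `min_samples` points, itself included, lie within `eps` of it; clusters grow outward from core points, and anything not reachable from a core point stays labelled as noise (-1). The function name `simple_dbscan` and the toy coordinates are assumptions.

```python
import numpy as np

def simple_dbscan(points, eps, min_samples):
    """Minimal DBSCAN sketch: returns one label per point, -1 for noise."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    # Pairwise Euclidean distances (fine for small n; uses O(n^2) memory)
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    # eps-neighbourhood of every point, including the point itself
    neighbours = [np.flatnonzero(dists[i] <= eps) for i in range(n)]
    is_core = np.array([len(nb) >= min_samples for nb in neighbours])

    labels = np.full(n, -1)
    cluster_id = 0
    for i in range(n):
        if labels[i] != -1 or not is_core[i]:
            continue
        # Grow a new cluster from this unassigned core point
        labels[i] = cluster_id
        stack = [i]
        while stack:
            p = stack.pop()
            if not is_core[p]:
                continue  # border points join a cluster but do not expand it
            for q in neighbours[p]:
                if labels[q] == -1:
                    labels[q] = cluster_id
                    stack.append(q)
        cluster_id += 1
    return labels

# Two tight groups of four points plus one isolated point
toy = [[0, 0], [0, 0.1], [0.1, 0], [0.1, 0.1],
       [5, 5], [5, 5.1], [5.1, 5], [5.1, 5.1],
       [10, 10]]
print(simple_dbscan(toy, eps=0.5, min_samples=3).tolist())
```

No k is passed in: two clusters emerge from the data, and the isolated point ends up as noise.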

In [157]:
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

#Standardising the coordinates for fitting (each axis is scaled separately,
#so eps is in standard-deviation units, not metres)
pairs = df[['Lat', 'Long']]
pairs = StandardScaler().fit_transform(pairs)

db = DBSCAN(eps=0.3, min_samples=7).fit(pairs)
labels = db.labels_
print(labels[500:560])

#Assigning each POI to a market; -1 marks noise points outside every cluster
df["Market"] = labels

#Counting clusters without and with the noise label -1
realClusterNum = len(set(labels)) - (1 if -1 in labels else 0)
clusterNum = len(set(labels))
[0 2 2 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0
 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
In [159]:
set(labels)
Out[159]:
{-1, 0, 1, 2, 3}
In [166]:
#Each POI is assigned to a market, stored in the new 'Market' column of the DataFrame

df.head()
Out[166]:
id @id amenity atm brand:wikidata brand:wikipedia name name:en operator website ... drink:sugarcane_juice air_conditioning drive_through studio healthcare geometry Long Lat Clus_Db Market
0 node/280741143 node/280741143 bank yes Q2003549 en:Axis Bank Axis Bank Axis Bank Canara Bank https://www.axisbank.com ... None None None None None POINT (77.19423500000001 28.64725) 77.194235 28.647250 0 0
1 node/355436037 node/355436037 atm None None None ICICI Bank None None None ... None None None None None POINT (77.1723786 28.6458869) 77.172379 28.645887 0 0
2 node/355436042 node/355436042 fast_food None None None Dominos Pizza None None None ... None None None None None POINT (77.1722681 28.6457998) 77.172268 28.645800 0 0
3 node/459771176 node/459771176 cinema None None None Fun Cinemas, CRM, Shahdara None Fun Multiplex Pvt Ltd None ... None None None None None POINT (77.30198 28.656726) 77.301980 28.656726 1 1
4 node/496457107 node/496457107 bus_station None None None None None None None ... None None None None None POINT (77.2516076 28.6108671) 77.251608 28.610867 0 0

5 rows × 130 columns
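To see what kind of market each cluster is, one option is to count the `amenity` tags per cluster with pandas. The small frame below is a hypothetical stand-in for `df` with only the `Market` and `amenity` columns; on the real data the same calls apply to `df` directly.

```python
import pandas as pd

# Hypothetical stand-in for df: cluster label and amenity tag per POI
pois = pd.DataFrame({
    "Market": [0, 0, 0, 1, 1, -1],
    "amenity": ["bank", "atm", "bank", "cinema", "fast_food", "bus_station"],
})

# Drop noise points (label -1), then count amenity types within each market
counts = (pois[pois["Market"] != -1]
          .groupby("Market")["amenity"]
          .value_counts())

# One row per market, one column per amenity type (0 where a type is absent)
table = counts.unstack(fill_value=0)
print(table)

# Most frequent amenity per market (ties resolve to the first column)
top = table.idxmax(axis=1)
print(top.to_dict())
```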

Things Left

1. Plotting the DBSCAN clusters in Folium or another interactive map package

2. Including OpenStreetMap way and relation elements (not only nodes) in the clustering
